An Empirical Study of Arabic Formulaic Sequence Extraction Methods
Abstract
This paper implements the collocation-of-Arabic-keywords approach to extracting formulaic sequences (FSs): high-frequency, semantically regular formulas that are not restricted to any syntactic construction or semantic domain. The study applies several distributional semantic models to automatically extract FSs associated with Arabic keywords. The data sets are drawn from a newly developed corpus-based Arabic wordlist of 5,189 lexical items representing a variety of Modern Standard Arabic (MSA) genres and regions. The wordlist is based on overlapping frequencies obtained from a comprehensive comparison of four large Arabic corpora with a total size of over 8 billion running words. Empirical n-best precision evaluation is used to determine which association measures (AMs) best extract high-frequency, meaningful FSs. The gold-standard reference list of FSs was developed in previous studies and manually evaluated against well-established quantitative and qualitative criteria. The results demonstrate that the MI.log_f AM performed best in extracting significant FSs from the large MSA corpus, while the T-score AM performed worst.
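As a rough illustration of the evaluation described above, the sketch below scores candidate word pairs with three association measures and computes n-best precision against a gold list. It is not the paper's implementation. The formulas are assumptions based on the definitions commonly given in corpus-tool documentation (MI as the base-2 log of observed over expected co-occurrence, T-score as the observed-minus-expected count divided by the square root of the observed count, and MI.log_f as MI multiplied by the natural log of the co-occurrence frequency plus one). The candidate labels, frequencies, corpus size, and gold set are invented toy values, not data or results from the study.

```python
import math

# Assumed definitions of the association measures (not taken from the paper).
# f_ab: co-occurrence frequency; f_a, f_b: word frequencies; n: corpus size.

def mi(f_ab, f_a, f_b, n):
    """Mutual information: log2 of observed over expected co-occurrence."""
    return math.log2(f_ab * n / (f_a * f_b))

def t_score(f_ab, f_a, f_b, n):
    """T-score: (observed - expected) / sqrt(observed)."""
    return (f_ab - f_a * f_b / n) / math.sqrt(f_ab)

def mi_log_f(f_ab, f_a, f_b, n):
    """MI.log_f: MI weighted by the natural log of (f_ab + 1)."""
    return mi(f_ab, f_a, f_b, n) * math.log(f_ab + 1)

def n_best_precision(candidates, measure, gold, n_best, corpus_size):
    """Share of the n_best top-ranked candidates that appear in the gold list."""
    ranked = sorted(
        candidates,
        key=lambda c: measure(c[1], c[2], c[3], corpus_size),
        reverse=True,
    )
    top = ranked[:n_best]
    hits = sum(1 for label, *_ in top if label in gold)
    return hits / len(top)

# Hypothetical toy data: (label, f_ab, f_a, f_b).
CORPUS_SIZE = 1_000_000
CANDIDATES = [
    ("pair_a", 50, 100, 80),
    ("pair_b", 2, 2, 3),           # rare pair: high MI, low frequency
    ("pair_c", 400, 20000, 30000), # frequent words, co-occur less than expected
    ("pair_d", 300, 5000, 4000),
]
GOLD = {"pair_a", "pair_d"}  # hypothetical gold-standard FSs

if __name__ == "__main__":
    for name, measure in [("MI", mi), ("T-score", t_score), ("MI.log_f", mi_log_f)]:
        p = n_best_precision(CANDIDATES, measure, GOLD, 2, CORPUS_SIZE)
        print(f"{name}: 2-best precision = {p:.2f}")
```

On these toy numbers, plain MI ranks the rare pair first, while the frequency weighting in MI.log_f pushes it down. That is the general motivation for such a weighting and says nothing about the study's actual figures.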
Similar resources
A methodology for the extraction of information about the usage of formulaic expressions in scientific texts
In this paper, we present a methodology for the extraction of formulaic expressions, which goes beyond the mere extraction of candidate patterns. Using a pipeline we are able to extract information about the usage of formulaic expressions automatically from text corpora. According to Biber and Barbieri (2007) formulaic expressions are “important building blocks of discourse in spoken and writte...
Presenting an Empirical Correlation for Maximum Sauter Mean Diameter in a Spray Extraction Column
Based on the importance of drop behavior in liquid-liquid extraction, the maximum Sauter mean drop diameter has been investigated and correlated in a counter-current spray extraction column with two chemical systems. The spargers were sets of nozzles in all experiments. Studying the effects of several parameters on drop size, some correlations were estimated by the last available version of softw...
Developing EFL Learners' Oral Proficiency through Animation-based Instruction of English Formulaic Sequences
The current pretest-posttest quasi-experimental study attempts, firstly, to probe the effects of teaching formulaic sequences (FSs) on second or foreign language (L2) learners' oral proficiency improvement and, secondly, to examine whether teaching FSs through different resources (i.e., animation vs. text-based readings) has any differentially influential effects in augmenting L2 l...
Formulaic Language in Alzheimer's Disease.
BACKGROUND: Studies of productive language in Alzheimer's disease (AD) have focused on formal testing of syntax and semantics but have directed less attention to naturalistic discourse and formulaic language. Clinical observations suggest that individuals with AD retain the ability to produce formulaic language long after other cognitive abilities have deteriorated. AIMS: This study quantifies ...
The Comparison of different Procedures for DNA extraction from paraffin-embedded Tissues: A commercial kit and a traditional method based on heating
Abstract. Background and objectives: Paraffin-embedded tissues and clinical samples are a valuable resource for molecular genetic studies, but the extraction of high-quality genomic DNA from these tissues is still problematic. In the present study, the efficiency of two DNA extraction protocols, a commercial kit and a traditional method based on heating and proteinase K, was compared. Mate...